The site https://www.mathgenealogy.org/ contains over 276,000 observations of mathematics PhD graduates and their supervisors. This is effectively a genealogy of mathematical supervision (which should have some sizable effect on thinking, topics, and reading). The R package ggenealogy contains an example dataset from this source and facilitates the consumption and plotting of this type of data.
Given that my thesis was just certified, I want to see if I can trace up the mathematical genealogy tree to visualize my thought-leading predecessors.
library(ggenealogy)
library(ggplot2)
library(magrittr)
data("statGeneal", package = "ggenealogy")
df <- statGeneal %>%
  #dplyr::filter(parent != "") %>%
  tibble::as_tibble()
print(df, n=3)
## # A tibble: 8,165 x 6
## child parent gradYear country
## <chr> <chr> <dbl> <chr>
## 1 Nicolas Chopin "Christian Robert" 2003 France
## 2 Melvin Springer "Everett Welker" 1947 UnitedStates
## 3 Shelemyahu Zacks "" 1962 UnitedStates
## school
## <chr>
## 1 Université Pierre-et-Marie-Curie - Paris VI
## 2 University of Illinois at Urbana-Champaign
## 3 Columbia University
## thesis
## <chr>
## 1 Applications of Sequential Monte Carlo methods to Bayesian Statistics
## 2 Joint Sampling Distribution of Mean and Standard Deviation for a Chi-square U~
## 3 Optimal Strategies in Randomized Factorial Experiments
## # ... with 8,162 more rows
hist(df$gradYear)
OK, about 8k observations covering “all the parent-child relationships where both parent and child received an advanced degree of statistics as of June 6, 2015.” This may or may not contain the people I am looking for.
Note that grad year:
Through trial and error I know that Di Cook is not in the data. The original paper does have Thomas Lumley, another professor of interest. But perhaps first I will manually look up Cook’s genealogy.
Di, Di’s supervisor, and “grand-supervisor” are not in the list, so I may have to go to plan B: looking at Thomas Lumley. After looking at both parents and children, I know that Thomas has one child in the data: Petra Buzkova. From the paper, we can see that the oldest predecessor is David Cox.
lumley_p <- grepl("Lumley", df$parent, fixed = TRUE)
sum(lumley_p)
## [1] 1
df[lumley_p, ]
## # A tibble: 1 x 6
## child parent gradYear country school
## <chr> <chr> <dbl> <chr> <chr>
## 1 Petra Buzkova Thomas Lumley 2004 UnitedStates University of Washington
## thesis
## <chr>
## 1 Marginal Regression Analysis of Longitudinal Data with Irregular, Biased Samp~
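The lookup above only searches the parent column. A minimal sketch of checking both columns in one pass follows, using a small made-up data frame in place of `df`. The helper name `find_person` and the toy rows are assumptions for illustration and are not part of ggenealogy.

```r
# Toy stand-in for the statGeneal tibble; the rows are illustrative only.
toy <- data.frame(
  child  = c("Petra Buzkova", "Student A", "Student B"),
  parent = c("Thomas Lumley", "Advisor X", ""),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the rows where `pattern` appears as either
# a child or a parent, with two flag columns saying which role matched.
find_person <- function(geneal, pattern) {
  geneal$as_child  <- grepl(pattern, geneal$child,  fixed = TRUE)
  geneal$as_parent <- grepl(pattern, geneal$parent, fixed = TRUE)
  geneal[geneal$as_child | geneal$as_parent, , drop = FALSE]
}

hits <- find_person(toy, "Lumley")
hits
```

On the real data the equivalent call would be `find_person(df, "Lumley")`. A person who appears only as a parent, as Lumley seems to here, has no recorded supervisor row of their own in this extract.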
## Prep the network info, more on this in `As network layout (iGraph)`.
ig <- dfToIG(df)
Let’s grab the paths while we are on the topic of names. Actually, if we go all the way to Buzkova, this is the example case in the paper.
pathCB <- getPath("David Cox", "Petra Buzkova", ig, df,
                  "gradYear", isDirected = FALSE)
plotPath(pathCB, df, "gradYear", fontFace = 4) +
  xlab("Graduation Year") +
  theme(axis.text = element_text(size = 10),
        axis.title = element_text(size = 10)) +
  scale_x_continuous(expand = c(0.1, 0.2))
Good, we have a start. We will want to find a way to traverse the hierarchy to find all of the ancestors without filling in the cousin nodes (or, preferably, filling them in only faintly). As an example poster, see https://www.mathgenealogy.org/posters/raich.pdf
l <- plotAncDes("David Cox", df, mAnc = 1, mDes = 6, vCol = "blue") +
  labs(subtitle = "Interesting, but too many \n cousins of Thomas Lumley")
r <- plotAncDes("Thomas Lumley", df, mAnc = 6, mDes = 1, vCol = "blue") +
  labs(subtitle = "Not very interesting, \n nb only 1:1 relationships")
library(patchwork)
l + r
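The ancestors-only traversal wished for earlier can be sketched in base R by walking up the child/parent table one generation at a time. This is a rough illustration on made-up data, not a ggenealogy feature: the helper `get_ancestors` and the placeholder names are assumptions.

```r
# Toy edge list in the same child/parent shape as statGeneal.
# The names are placeholders, not real genealogy records.
toy <- data.frame(
  child  = c("B", "C", "D", "E", "F"),
  parent = c("A", "B", "B", "C", ""),
  stringsAsFactors = FALSE
)

# Hypothetical helper: walk up from `person`, collecting each ancestor with
# its generation distance. Empty parent names are treated as "unknown".
get_ancestors <- function(geneal, person) {
  out <- data.frame(name = character(0), gen = integer(0),
                    stringsAsFactors = FALSE)
  frontier <- person
  gen <- 0L
  while (length(frontier) > 0) {
    gen <- gen + 1L
    parents <- unique(geneal$parent[geneal$child %in% frontier])
    # Drop unknown parents and anyone already visited (guards against cycles).
    parents <- setdiff(parents, c("", NA, out$name, person))
    if (length(parents) > 0) {
      out <- rbind(out, data.frame(name = parents, gen = gen,
                                   stringsAsFactors = FALSE))
    }
    frontier <- parents
  }
  out
}

anc <- get_ancestors(toy, "E")
anc
```

Here "D" shares a parent with "C" but is never visited, so siblings and cousins of the direct line stay out of the result. On the real data, the returned names could be used to subset `df` before plotting, or to decide which nodes to draw faintly.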
plotPathOnAll(pathCB, df, ig, "gradYear",
              bin = 200, nodeSize = 1, pathNodeSize = 2.5,
              nodeCol = "darkgray", edgeCol = "lightgray",
              animate = TRUE) ## plotly static interaction not animated.
ig <- dfToIG(df)
class(ig)
## [1] "igraph"
ig
## IGRAPH 8e8f3b0 UNW- 7123 8165 --
## + attr: name (v/c), weight (e/n)
## + edges from 8e8f3b0 (vertex names):
## [1] Nicolas Chopin --Christian Robert Melvin Springer --Everett Welker
## [3] Shelemyahu Zacks -- James Sweeder --
## [5] Nino Kordzakhia -- Pavel Vanecek --Zuzana Prášková
## [7] Shyamal De -- Thomas Willke --
## [9] Vasant Huzurbazar-- Rita Engelhardt --William Cumberland
## [11] Fred Andrews -- Arthur Albert --
## [13] John Folks -- Arnold Goodman --
## [15] William Pruitt -- Thomas Birkner --
## + ... omitted several edges
getBasicStatistics(ig)
## $isConnected
## [1] TRUE
##
## $numComponents
## [1] 1
##
## $avePathLength
## [1] 2.801
##
## $graphDiameter
## [1] 10
##
## $numNodes
## [1] 7123
##
## $numEdges
## [1] 8165
##
## $logN
## [1] 8.871
plot(ig)
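One thing worth checking in the statistics above: the edge listing shows many children paired with an empty parent name, and the edge count (8,165) equals the number of rows, so rows with `parent == ""` appear to be kept as edges. If they all point at one shared blank vertex, that vertex would tie unrelated lineages together, which may be why the graph reports a single component and a short average path length. The base-R sketch below illustrates the effect on made-up data. The helper `count_components` is hypothetical and is not how ggenealogy or igraph compute this.

```r
# Toy child/parent table: two unrelated lineages (A-B-C and X-Y) whose
# roots both have an unknown ("") parent.
toy <- data.frame(
  child  = c("B", "C", "Y", "A", "X"),
  parent = c("A", "B", "X", "",  ""),
  stringsAsFactors = FALSE
)

# Hypothetical helper: number of connected components of the undirected
# graph whose edges are the child-parent rows.
count_components <- function(geneal) {
  nodes <- unique(c(geneal$child, geneal$parent))
  from  <- match(geneal$child,  nodes)
  to    <- match(geneal$parent, nodes)
  label <- seq_along(nodes)
  # Propagate the smallest label across edges until nothing changes.
  repeat {
    low <- pmin(label[from], label[to])
    new_label <- label
    for (i in seq_along(from)) {
      new_label[from[i]] <- min(new_label[from[i]], low[i])
      new_label[to[i]]   <- min(new_label[to[i]],   low[i])
    }
    if (identical(new_label, label)) break
    label <- new_label
  }
  length(unique(label))
}

with_blank    <- count_components(toy)
without_blank <- count_components(toy[toy$parent != "", ])
c(with_blank = with_blank, without_blank = without_blank)
```

With the blank parent kept, the two toy lineages merge into one component; after filtering they separate again. Re-enabling the commented-out `dplyr::filter(parent != "")` step before calling `dfToIG()` would be one way to test whether the real graph behaves the same way.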
## Packages used
pkgs <- c("ggenealogy", "ggplot2")
## Package & session info
devtools::session_info(pkgs)
## - Session info ---------------------------------------------------------------
## setting value
## version R version 4.1.2 (2021-11-01)
## os Windows 10 x64 (build 19044)
## system x86_64, mingw32
## ui RTerm
## language (EN)
## collate English_United States.1252
## ctype English_United States.1252
## tz Australia/Sydney
## date 2022-06-09
## pandoc 2.11.4 @ C:/Program Files/RStudio/bin/pandoc/ (via rmarkdown)
##
## - Packages -------------------------------------------------------------------
## package * version date (UTC) lib source
## askpass 1.1 2019-01-13 [1] CRAN (R 4.1.2)
## base64enc 0.1-3 2015-07-28 [1] CRAN (R 4.1.1)
## cli 3.3.0 2022-04-25 [1] CRAN (R 4.1.3)
## colorspace 2.0-3 2022-02-21 [1] CRAN (R 4.1.2)
## cpp11 0.4.2 2021-11-30 [1] CRAN (R 4.1.2)
## crayon 1.5.1 2022-03-26 [1] CRAN (R 4.1.3)
## crosstalk 1.2.0 2021-11-04 [1] CRAN (R 4.1.2)
## curl 4.3.2 2021-06-23 [1] CRAN (R 4.1.2)
## data.table 1.14.2 2021-09-27 [1] CRAN (R 4.1.2)
## digest 0.6.29 2021-12-01 [1] CRAN (R 4.1.2)
## dplyr 1.0.9 2022-04-28 [1] CRAN (R 4.1.3)
## ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.0.5)
## fansi 1.0.3 2022-03-24 [1] CRAN (R 4.1.3)
## farver 2.1.0 2021-02-28 [1] CRAN (R 4.1.2)
## fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.1.2)
## generics 0.1.2 2022-01-31 [1] CRAN (R 4.1.2)
## ggenealogy * 1.0.1 2020-03-04 [1] CRAN (R 4.1.3)
## ggplot2 * 3.3.6 2022-05-03 [1] CRAN (R 4.1.3)
## glue 1.6.2 2022-02-24 [1] CRAN (R 4.1.2)
## gtable 0.3.0 2019-03-25 [1] CRAN (R 4.1.1)
## htmltools 0.5.2 2021-08-25 [1] CRAN (R 4.1.1)
## htmlwidgets 1.5.4 2021-09-08 [1] CRAN (R 4.1.2)
## httr 1.4.3 2022-05-04 [1] CRAN (R 4.1.3)
## igraph 1.3.1 2022-04-20 [1] CRAN (R 4.1.3)
## isoband 0.2.5 2021-07-13 [1] CRAN (R 4.1.2)
## jsonlite 1.8.0 2022-02-22 [1] CRAN (R 4.1.3)
## labeling 0.4.2 2020-10-20 [1] CRAN (R 4.1.1)
## later 1.3.0 2021-08-18 [1] CRAN (R 4.1.2)
## lattice 0.20-45 2021-09-22 [1] CRAN (R 4.1.3)
## lazyeval 0.2.2 2019-03-15 [1] CRAN (R 4.1.2)
## lifecycle 1.0.1 2021-09-24 [1] CRAN (R 4.1.2)
## magrittr * 2.0.3 2022-03-30 [1] CRAN (R 4.1.3)
## MASS 7.3-57 2022-04-22 [1] CRAN (R 4.1.3)
## Matrix 1.4-1 2022-03-23 [1] CRAN (R 4.1.3)
## mgcv 1.8-40 2022-03-29 [1] CRAN (R 4.1.3)
## mime 0.12 2021-09-28 [1] CRAN (R 4.1.1)
## munsell 0.5.0 2018-06-12 [1] CRAN (R 4.1.1)
## nlme 3.1-157 2022-03-25 [1] CRAN (R 4.1.3)
## openssl 2.0.2 2022-05-24 [1] CRAN (R 4.1.3)
## pillar 1.7.0 2022-02-01 [1] CRAN (R 4.1.2)
## pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.1.2)
## plotly 4.10.0 2021-10-09 [1] CRAN (R 4.1.2)
## plyr 1.8.7 2022-03-24 [1] CRAN (R 4.1.3)
## promises 1.2.0.1 2021-02-11 [1] CRAN (R 4.1.2)
## purrr 0.3.4 2020-04-17 [1] CRAN (R 4.0.3)
## R6 2.5.1 2021-08-19 [1] CRAN (R 4.1.1)
## RColorBrewer 1.1-3 2022-04-03 [1] CRAN (R 4.1.3)
## Rcpp 1.0.8.3 2022-03-17 [1] CRAN (R 4.1.3)
## reshape2 1.4.4 2020-04-09 [1] CRAN (R 4.1.2)
## rlang 1.0.2 2022-03-04 [1] CRAN (R 4.1.3)
## scales 1.2.0 2022-04-13 [1] CRAN (R 4.1.3)
## stringi 1.7.6 2021-11-29 [1] CRAN (R 4.1.2)
## stringr 1.4.0 2019-02-10 [1] CRAN (R 4.1.2)
## sys 3.4 2020-07-23 [1] CRAN (R 4.1.2)
## tibble 3.1.7 2022-05-03 [1] CRAN (R 4.1.3)
## tidyr 1.2.0 2022-02-01 [1] CRAN (R 4.1.2)
## tidyselect 1.1.2 2022-02-21 [1] CRAN (R 4.1.2)
## utf8 1.2.2 2021-07-24 [1] CRAN (R 4.1.2)
## vctrs 0.4.1 2022-04-13 [1] CRAN (R 4.1.3)
## viridisLite 0.4.0 2021-04-13 [1] CRAN (R 4.1.2)
## withr 2.5.0 2022-03-03 [1] CRAN (R 4.1.2)
## yaml 2.3.5 2022-02-21 [1] CRAN (R 4.1.2)
##
## [1] C:/Users/spyri/Documents/R/win-library/4.1
## [2] C:/Program Files/R/R-4.1.2/library
##
## ------------------------------------------------------------------------------